1. Introduction. The form of the Bayes estimate of the population
mean with respect to a Dirichlet prior with parameter $\alpha$ has given rise to
the interpretation that $\alpha(\mathcal{X})$ is the prior sample size. Furthermore, if
$\alpha(\mathcal{X})$ is made to tend to zero, then the Bayes estimate mathematically
converges to the classical estimator, namely the sample mean. This has
further given rise to the general feeling that allowing $\alpha(\mathcal{X})$ to become
small not only makes the 'prior sample size' small but also
corresponds to no prior information. By investigating the limits of
prior distributions as the parameter $\alpha$ tends to various values, we show
that it is misleading to think of $\alpha(\mathcal{X})$ as the prior sample size and the

smallness of $\alpha(\mathcal{X})$ as no prior information. In fact very small values of
$\alpha(\mathcal{X})$ actually mean that we have very definite information concerning the

2. The Dirichlet measure. Let $(\mathcal{X}, \mathcal{A})$ be a separable metric space


endowed with the corresponding Borel $\sigma$-field. Let $\mathcal{P}$ and $\mathcal{M}$ be the
spaces of probability measures and finite measures (countably additive) on
$(\mathcal{X}, \mathcal{A})$. The natural $\sigma$-field, $\sigma(\mathcal{P})$, on $\mathcal{P}$ is the smallest $\sigma$-field in $\mathcal{P}$
such that the function $P \mapsto P(A)$ is measurable for each $A$ in $\mathcal{A}$. There
is also the notion of weak convergence in both $\mathcal{P}$ and $\mathcal{M}$, namely, $\alpha_r \to \alpha$
if and only if $\int g\,d\alpha_r \to \int g\,d\alpha$ for all bounded continuous functions $g$ on $\mathcal{X}$.

Under this convergence $\mathcal{P}$ becomes a separable complete metric space
(Prohorov [4]) and the $\sigma$-field $\sigma(\mathcal{P})$ above is the Borel $\sigma$-field in $\mathcal{P}$.

For each non-zero measure $\alpha$ in $\mathcal{M}$, we denote by $\bar\alpha$ the corresponding normalized
measure, namely $\bar\alpha(A) = \alpha(A)/\alpha(\mathcal{X})$, $A \in \mathcal{A}$.
In non-parametric Bayesian analysis, the 'true' probability measure
$P$ takes values in $\mathcal{P}$, is random and has a prior distribution. To facilitate
the use of standard probability theory we must view $P$ as a measurable map
from some probability space $(\Omega, \mathcal{S}, Q)$ into $(\mathcal{P}, \sigma(\mathcal{P}))$ and the induced
measure $QP^{-1}$ becomes the prior distribution. For any non-zero measure $\alpha$
in $\mathcal{M}$, the Dirichlet prior measure $D_\alpha$ with parameter $\alpha$ is defined as
follows (Ferguson [3]): For any finite measurable partition $(A_1, \ldots, A_k)$
of $\mathcal{X}$, the distribution of $(P(A_1), \ldots, P(A_k))$ under $D_\alpha$ is the singular
Dirichlet distribution $D(\alpha(A_1), \ldots, \alpha(A_k))$ defined on the $k$-dimensional
simplex as in Wilks [7] Section 7.7. Ferguson [3] used this definition
and also an alternate definition (see Theorem 1 of Ferguson [3]), and
derived many properties of Dirichlet priors and the corresponding Bayes
estimates of population parameters. Blackwell [1] and Blackwell and
MacQueen [2] have also given alternative definitions of the Dirichlet
prior. We give below yet another definition of the Dirichlet prior which
is more general than the previous ones since we will not have to assume
that $\mathcal{X}$ is separable metric. Let $\alpha$ be a non-zero measure in $\mathcal{M}$. Let
$(\Omega, \mathcal{S}, Q)$ be a probability space rich enough to support two independent
sequences of i.i.d. random variables $Y_1, Y_2, \ldots$ and $\theta_1, \theta_2, \ldots$, where
$Y_1$ is $\mathcal{X}$-valued and has distribution $\bar\alpha$ and $\theta_1$ is real valued and has a
Beta distribution with parameters 1 and $\alpha(\mathcal{X})$. Let $p_1 = \theta_1$, $p_2 = \theta_2(1-\theta_1)$,
$p_3 = \theta_3(1-\theta_1)(1-\theta_2)$, .... For any $y$ in $\mathcal{X}$ let $\delta_y$ stand for the degenerate
probability measure at $y$. Define the measurable map $P$ from $(\Omega, \mathcal{S})$ into
$(\mathcal{P}, \sigma(\mathcal{P}))$ as follows:

$$P(A) = \sum_{j=1}^{\infty} p_j\,\delta_{Y_j}(A). \tag{1.1}$$

Then the induced distribution of $P$ is the Dirichlet measure $D_\alpha$ with
parameter $\alpha$. The proof of this fact, and of the fact that the standard
properties of Dirichlet measures can be deduced from it, will be given
elsewhere (Sethuraman [5]).
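The construction (1.1) can be simulated directly. The following is a minimal Python sketch added for illustration; it is not part of the original argument. The function name `sample_stick_breaking`, the truncation tolerance `tol`, the cap `max_atoms` and the standard normal used for the normalized measure are all assumptions of the example. Each $\theta_j$ is drawn from the Beta distribution with parameters 1 and $c$ by inverting its distribution function $1 - (1-x)^c$, and the infinite sum is truncated once the unassigned mass falls below `tol`.

```python
import random


def sample_stick_breaking(total_mass, base_sampler, rng, tol=1e-10, max_atoms=100_000):
    """Draw a truncated realization of P = sum_j p_j * delta_{Y_j} as in (1.1).

    total_mass   -- alpha(X), must be positive
    base_sampler -- hypothetical callable rng -> one draw Y_j from the normalized measure
    Returns a list of (p_j, Y_j) pairs; the weights sum to at least 1 - tol
    unless max_atoms is reached first.
    """
    if total_mass <= 0:
        raise ValueError("total_mass must be positive")
    atoms = []
    remaining = 1.0  # (1 - theta_1) ... (1 - theta_{j-1})
    while remaining > tol and len(atoms) < max_atoms:
        # If U is uniform on [0, 1), then 1 - U**(1/c) has the Beta(1, c) law.
        theta = 1.0 - rng.random() ** (1.0 / total_mass)
        atoms.append((remaining * theta, base_sampler(rng)))
        remaining *= 1.0 - theta
    return atoms


rng = random.Random(0)  # fixed seed so the sketch is reproducible
draw = sample_stick_breaking(2.0, lambda r: r.gauss(0.0, 1.0), rng)
weights = [p for p, _ in draw]
```

The returned pairs describe a discrete probability measure; a smaller `total_mass` puts more of the mass on the first few atoms.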

In the statistical problem of non-parametric Bayesian analysis we
have a random variable $P$ taking values in $\mathcal{P}$ and whose distribution is $D_\alpha$.
We also have a sample $X_1, \ldots, X_n$, which are random variables taking values
in $\mathcal{X}$. Given $P$, these are i.i.d. with common distribution $P$. It is required
to estimate a function $\phi(P)$, and the Bayes estimator $\hat\phi_\alpha$ with respect
to squared error loss is given by

$$E(\phi(P) \mid X_1, \ldots, X_n).$$

In particular, if $\phi(P) = \phi_g(P)$ where

$$\phi_g(P) = \int g(x)\,P(dx) \tag{1.2}$$

where $g$ is a real valued measurable function on $\mathcal{X}$ with $\int g^2\,d\alpha < \infty$, then the
Bayes estimate is given by

$$\frac{\alpha(\mathcal{X}) \int g\,d\bar\alpha + n \int g\,dF_n}{\alpha(\mathcal{X}) + n} \tag{1.3}$$

where $F_n$ is the empirical d.f. of $X_1, \ldots, X_n$ (Ferguson [3]). In this, if
we let $\alpha(\mathcal{X}) \to 0$ we obtain the classical estimate $\int g\,dF_n$. Also the denominator
in this estimate is $\alpha(\mathcal{X}) + n$, which is $\alpha(\mathcal{X})$ plus the sample size. These
facts have given rise to the interpretation that $\alpha(\mathcal{X})$ is the prior sample
size and that allowing $\alpha(\mathcal{X})$ to tend to zero corresponds to no prior information.
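The arithmetic behind this interpretation is easy to reproduce. The sketch below is an illustration only; the function name `bayes_estimate` and the sample values are assumptions, and the prior mean of $g$ is passed in as a number rather than computed from a measure. It evaluates the estimate (1.3) and shows it moving toward the sample mean as the total mass shrinks.

```python
def bayes_estimate(total_mass, prior_mean_g, g_values):
    """Evaluate (1.3): (alpha(X) * int g d(alpha-bar) + n * int g dF_n) / (alpha(X) + n).

    total_mass   -- alpha(X) >= 0
    prior_mean_g -- int g d(alpha-bar), supplied by the caller
    g_values     -- g(X_1), ..., g(X_n) for the observed sample (n >= 1)
    """
    n = len(g_values)
    sample_mean_g = sum(g_values) / n  # int g dF_n
    return (total_mass * prior_mean_g + n * sample_mean_g) / (total_mass + n)


g_values = [1.0, 2.0, 6.0]  # illustrative data with sample mean 3.0
for mass in (10.0, 1.0, 0.01):
    print(mass, bayes_estimate(mass, 0.0, g_values))
```

This reproduces only the limit of the estimate; it says nothing about what the prior itself does as the total mass tends to zero, which is the question taken up in the following sections.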

In the next section we investigate what happens to Dirichlet measures
when their parameters are allowed to converge to certain values. In
Section 4 we investigate what happens to Bayes estimates when the parameters
of the corresponding Dirichlet priors are allowed to converge to the zero
measure. From the results in these two sections it follows that small
values of $\alpha(\mathcal{X})$ actually correspond to certain definitive information about


3. Convergence of Dirichlet measures. In this section we study
the convergence of Dirichlet measures as their parameter is allowed to
converge in appropriate ways. Since $(\mathcal{P}, \sigma(\mathcal{P}))$ is a separable complete
metric space endowed with its Borel $\sigma$-field, we can talk about the usual
weak convergence of probability measures on $(\mathcal{P}, \sigma(\mathcal{P}))$ and of Dirichlet
measures, in particular.

THEOREM 3.1. Let $\{\alpha_r\}$ be a sequence of measures in $\mathcal{M}$ and let the
sequence of normalized measures $\{\bar\alpha_r\}$ be tight. Then the sequence $\{D_{\alpha_r}\}$ of
Dirichlet measures is tight.

PROOF. Fix $\varepsilon > 0$. There exists a sequence $K_1, K_2, \ldots$ of compact sets in $\mathcal{X}$
such that

$$\sup_r \bar\alpha_r(K_d^c) \le \frac{6\varepsilon}{d^3\pi^2}, \tag{3.1}$$

$d = 1, 2, \ldots$. Let

$$M_d = \{P : P(K_d^c) \le 1/d\}, \tag{3.2}$$

$d = 1, 2, \ldots$ and let

$$M = \bigcap_d M_d. \tag{3.3}$$

Then clearly $M$ is a compact subset of $\mathcal{P}$ in the weak topology. Now, by the
Chebysheff inequality

$$D_{\alpha_r}(M_d^c) \le d\,E_{D_{\alpha_r}}(P(K_d^c)) = d\,\bar\alpha_r(K_d^c) \le \frac{6\varepsilon}{\pi^2 d^2} \tag{3.4}$$

and hence

$$D_{\alpha_r}(M^c) \le \sum_d \frac{6\varepsilon}{\pi^2 d^2} = \varepsilon, \quad \text{for all } r. \tag{3.5}$$

This proves that $\{D_{\alpha_r}\}$ is tight. □
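The constant $6\varepsilon/\pi^2$ in (3.1) is chosen so that the bound in (3.5) sums to exactly $\varepsilon$. The intermediate steps, spelled out here for convenience and not part of the original proof, use only the union bound over $d$ and the classical value $\sum_{d \ge 1} d^{-2} = \pi^2/6$:

```latex
D_{\alpha_r}(M^c)
  = D_{\alpha_r}\Bigl(\bigcup_{d \ge 1} M_d^c\Bigr)
  \le \sum_{d=1}^{\infty} D_{\alpha_r}(M_d^c)
  \le \frac{6\varepsilon}{\pi^2} \sum_{d=1}^{\infty} \frac{1}{d^2}
  = \frac{6\varepsilon}{\pi^2} \cdot \frac{\pi^2}{6}
  = \varepsilon .
```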


THEOREM 3.2. Let $\{\alpha_r\}$ be a sequence of measures in $\mathcal{M}$ such that

$$\sup_A |\alpha_r(A) - \alpha_0(A)| \to 0 \tag{3.6}$$

where $\alpha_0$ is a non-zero measure in $\mathcal{M}$. Then $D_{\alpha_r}$ converges to $D_{\alpha_0}$ weakly.

PROOF. The proof of this result rests heavily on the constructive
definition of the Dirichlet measure in (1.1) and the following result
which is proved in Sethuraman [6].

Let $\{\beta_r\}$ be a sequence of probability measures on an arbitrary measurable
space $(\mathcal{Y}, \mathcal{B})$ and let

$$\sup_B |\beta_r(B) - \beta_0(B)| \to 0, \tag{3.7}$$

where $\beta_0$ is a probability measure on $(\mathcal{Y}, \mathcal{B})$. Then there exists a sequence
of $\mathcal{Y}$-valued random variables $\{Y_r\}_0^\infty$ with marginal distributions $\{\beta_r\}_0^\infty$ such
that

$$\mathrm{Prob}(Y_r \ne Y_0) \to 0 \quad \text{as } r \to \infty. \tag{3.8}$$

From (1.1) and the above result, we can find independent sequences of i.i.d.
random variables $\{Y_j^r\}$, $\{\theta_j^r\}$, $r = 0, 1, 2, \ldots$ such that the distribution
of $Y_j^r$ is $\bar\alpha_r$, the distribution of $\theta_j^r$ is Beta with parameters 1 and $\alpha_r(\mathcal{X})$,
$r = 0, 1, \ldots$, and

$$\mathrm{Prob}(Y_j^r \ne Y_j^0) \to 0 \tag{3.9}$$

and

$$\mathrm{Prob}(\theta_j^r \ne \theta_j^0) \to 0 \quad \text{as } r \to \infty, \quad j = 1, 2, \ldots. \tag{3.10}$$

Furthermore, if $p_1^r = \theta_1^r$, $p_j^r = \theta_j^r(1-\theta_{j-1}^r) \cdots (1-\theta_1^r)$ for $j > 1$, and

$$P^r(A) = \sum_{j=1}^{\infty} p_j^r\,\delta_{Y_j^r}(A), \tag{3.11}$$

then the distribution of $P^r$ is the Dirichlet measure $D_{\alpha_r}$, $r = 0, 1, \ldots$.

From (3.11) it can be easily shown that, for any integer $m$,

$$\sup_A |P^r(A) - P^0(A)| \le \sum_{j=1}^{m} |p_j^r - p_j^0| + \sum_{j=1}^{m} I(Y_j^r \ne Y_j^0) + \prod_{j=1}^{m} (1-\theta_j^r) + \prod_{j=1}^{m} (1-\theta_j^0). \tag{3.12}$$

From the construction above and (3.8), (3.9) and (3.12), and by first
choosing $m$ appropriately and then allowing $r$ to tend to $\infty$, it follows that
$\sup_A |P^r(A) - P^0(A)| \to 0$ in probability, which is a stronger assertion than
that made in the theorem, namely that $D_{\alpha_r} \to D_{\alpha_0}$ weakly. □


THEOREM 3.3. Let $\{\alpha_r\}$ be a sequence of measures in $\mathcal{M}$ such that

$$\alpha_r(\mathcal{X}) \to 0 \quad \text{and} \quad \sup_A |\bar\alpha_r(A) - \alpha_0(A)| \to 0 \quad \text{as } r \to \infty, \tag{3.13}$$

where $\alpha_0$ is a probability measure in $\mathcal{P}$. Then the measures $D_{\alpha_r}$ converge to a
random degenerate measure $\delta_{Y^0}$ where $Y^0$ has distribution $\alpha_0$.

PROOF. As before we can construct independent sequences of i.i.d.
random variables $\{Y_j^r\}$ and $\{\theta_j^r\}$, and an independent random variable $Y^0$,
such that $Y_1^r$ has distribution $\bar\alpha_r$, $Y^0$ has distribution $\alpha_0$, the distribution
of $\theta_1^r$ is Beta with parameters 1 and $\alpha_r(\mathcal{X})$, $r = 1, 2, \ldots$, and

$$\mathrm{Prob}(Y_1^r \ne Y^0) \to 0 \quad \text{as } r \to \infty. \tag{3.14}$$

Furthermore, if $p_1^r = \theta_1^r$, $p_j^r = \theta_j^r(1-\theta_{j-1}^r) \cdots (1-\theta_1^r)$ for $j > 1$, and

$$P^r(A) = \sum_{j=1}^{\infty} p_j^r\,\delta_{Y_j^r}(A), \tag{3.15}$$

then the distribution of $P^r$ is the Dirichlet measure with parameter $\alpha_r$,
$r = 1, 2, \ldots$.

From (3.15), it is easily seen that

$$\sup_A |P^r(A) - \delta_{Y^0}(A)| \le I(Y_1^r \ne Y^0) + 2(1-p_1^r). \tag{3.16}$$

From (3.14) and the fact that $\alpha_r(\mathcal{X}) \to 0$, it follows that
$\sup_A |P^r(A) - \delta_{Y^0}(A)| \to 0$ in probability, which again is stronger than the
assertion of the theorem. □


From Theorem 3.3 it is clear that allowing $\alpha_r(\mathcal{X})$ to tend to zero does
not correspond to no information on $P$. In fact if $\alpha_r(\mathcal{X}) \to 0$ and the
normalized measure $\bar\alpha_r$ converges in the strong sense of (3.13) to a probability
measure $\alpha_0$, then the information about $P$ is that it is a probability measure
concentrated at a particular point in $\mathcal{X}$ which is chosen at random according
to $\alpha_0$. This is definitely very strong information about $P$ and most probably
not of the type any statistician would be willing to assume.
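The concentration on a single point can be seen numerically through the first weight in (3.16). The sketch below is an illustration under stated assumptions, not a reproduction of anything in the paper: the function name `mean_first_weight`, the seed and the number of draws are arbitrary choices. It averages simulated values of $p_1 = \theta_1$ and compares them with the exact mean $1/(1 + c)$ of a Beta variable with parameters 1 and $c$.

```python
import random


def mean_first_weight(total_mass, n_draws, rng):
    """Monte Carlo average of p_1 = theta_1, where theta_1 ~ Beta(1, total_mass)."""
    total = 0.0
    for _ in range(n_draws):
        # 1 - U**(1/c) has the Beta(1, c) law when U is uniform on [0, 1).
        total += 1.0 - rng.random() ** (1.0 / total_mass)
    return total / n_draws


rng = random.Random(1)  # fixed seed so the sketch is reproducible
for mass in (1.0, 0.1, 0.01):
    estimate = mean_first_weight(mass, 20_000, rng)
    exact = 1.0 / (1.0 + mass)  # mean of a Beta(1, c) variable is 1 / (1 + c)
    print(f"alpha(X) = {mass}: simulated {estimate:.3f}, exact {exact:.3f}")
```

As the total mass decreases, the average first weight approaches 1, so the term $2(1 - p_1^r)$ in (3.16) becomes negligible and almost all of the mass of $P^r$ sits on the single atom $Y_1^r$.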

4. Convergence of Bayes estimates. In this section we are mainly
interested in the limits of Bayes estimates of various functions $\phi(P)$ as
$\alpha(\mathcal{X}) \to 0$. We will therefore make the following assumption throughout this
section:

$$\alpha_r(\mathcal{X}) \to 0 \quad \text{and} \quad \sup_A |\bar\alpha_r(A) - \alpha_0(A)| \to 0, \tag{4.1}$$

where $\alpha_0$ is a probability measure in $\mathcal{P}$. We will also be mainly concerned
with a special class of functions $\phi(P)$ as defined below. Let $g$ be a
permutation invariant measurable function from $\mathcal{X}^k$ into $R^1$ such that

$$\int |g(x_1, \ldots, x_1, x_2, \ldots, x_2, \ldots, x_m, \ldots, x_m)|\,d\alpha(x_1) \cdots d\alpha(x_m) < \infty \tag{4.2}$$

for all possible combinations of arguments
$(x_1, \ldots, x_1, x_2, \ldots, x_2, \ldots, x_m, \ldots, x_m)$
from all distinct ($m = k$) to all identical ($m = 1$). When
the function $g$ vanishes whenever any two coordinates are equal, condition
(4.2) reduces to the simple condition

$$\int |g(x_1, \ldots, x_k)|\,d\alpha(x_1) \cdots d\alpha(x_k) < \infty. \tag{4.3}$$


Define the parametric function

$$\phi_g(P) = \int g(x_1, \ldots, x_k)\,dP(x_1) \cdots dP(x_k) \tag{4.4}$$

for all those $P$'s for which it exists. Let $P$ have $D_\alpha$ as the prior
distribution and let $(X_1, \ldots, X_n)$ be a sample from $P$. Under further assumptions
concerning the second moment of $g$ under $\alpha$, the Bayes estimate (with respect
to squared error loss) of $\phi_g(P)$ based on the sample is

$$\hat\phi^n_{g,\alpha} = E_{D_\alpha}(\phi_g(P) \mid X_1, \ldots, X_n), \tag{4.5}$$

and based on no sample is

$$\hat\phi^0_{g,\alpha} = E_{D_\alpha}(\phi_g(P)). \tag{4.6}$$

Since the conditional distribution of $P$ given $(X_1, \ldots, X_n)$ is $D_{\alpha + nF_n}$,
where $F_n$ is the empirical distribution function of $(X_1, \ldots, X_n)$, we have

$$\hat\phi^n_{g,\alpha} = \hat\phi^0_{g,\alpha + nF_n}. \tag{4.7}$$

Suppose that we substitute $\alpha = \alpha_r$ where $\{\alpha_r\}$ satisfies (4.1). From the
results of Section 3 we know that

$$D_{\alpha_r} \to \delta_{Y^0} \quad \text{weakly}, \tag{4.8}$$

and

$$D_{\alpha_r + nF_n} \to D_{nF_n} \tag{4.9}$$

as $r \to \infty$. The main result of this section pertains to the convergence of
the Bayes estimates $\hat\phi^0_{g,\alpha_r}$ and $\hat\phi^0_{g,\alpha_r + nF_n}$.


THEOREM 4.1. Let condition (4.1) hold. Let $g$ be a continuous function
from $\mathcal{X}^k$ into $R^1$. Let $g(x_1, \ldots, x_1, x_2, \ldots, x_2, \ldots, x_m, \ldots, x_m)$ be
uniformly integrable with respect to $\bar\alpha_r$, for all combinations of arguments
$(x_1, \ldots, x_1, x_2, \ldots, x_2, \ldots, x_m, \ldots, x_m)$ from all distinct to all
identical. Then

$$\hat\phi^0_{g,\alpha_r} \to \int g(x, \ldots, x)\,d\alpha_0(x) \tag{4.10}$$

and

$$\hat\phi^0_{g,\alpha_r + nF_n} \to \hat\phi^0_{g,nF_n} = E_{D_{nF_n}}(g(Z_1, \ldots, Z_k)) \tag{4.11}$$

where $(Z_1, \ldots, Z_k)$ is a sample from $P$ where $P$ has the distribution $D_{nF_n}$.


PROOF. The easiest way to prove this result is to use the
representation (1.1) for the random probability measure $P$ with a Dirichlet
distribution. The uniform integrability conditions on $g$ with respect to
$\bar\alpha_r$ immediately show that $\phi_g(P^r)$ is uniformly integrable with respect to $D_{\alpha_r}$,
since it is the convex combination of uniformly integrable functions as
given below:

$$\phi_g(P^r) = \sum_{j_1, \ldots, j_k} p^r_{j_1} \cdots p^r_{j_k}\, g(Y^r_{j_1}, \ldots, Y^r_{j_k}),$$

where $Y^r_1, Y^r_2, \ldots$ are i.i.d. with common distribution $\bar\alpha_r$. This fact and (4.8)
and (4.9) establish the results (4.10) and (4.11) of the theorem. □
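For $k = 2$ the no-sample Bayes estimate has a closed form, which makes (4.10) easy to check numerically. The sketch below is an illustration added here and is not taken from the paper. It relies on the identity $E \sum_j p_j^2 = 1/(1 + \alpha(\mathcal{X}))$ for the weights in (1.1), which gives $\hat\phi^0_{g,\alpha} = [\alpha(\mathcal{X}) \iint g\,d\bar\alpha\,d\bar\alpha + \int g(x,x)\,d\bar\alpha(x)] / (\alpha(\mathcal{X}) + 1)$. The finitely supported normalized measure and the names `atoms`, `probs` and `prior_estimate_k2` are assumptions made for the example.

```python
def prior_estimate_k2(g, atoms, probs, total_mass):
    """E[ iint g dP dP ] under D_alpha when alpha-bar puts mass probs[i] on atoms[i]."""
    double_integral = sum(
        pi * pj * g(xi, xj)
        for xi, pi in zip(atoms, probs)
        for xj, pj in zip(atoms, probs)
    )  # iint g d(alpha-bar) d(alpha-bar)
    diagonal_integral = sum(
        pi * g(xi, xi) for xi, pi in zip(atoms, probs)
    )  # int g(x, x) d(alpha-bar)(x)
    return (total_mass * double_integral + diagonal_integral) / (total_mass + 1.0)


def g(x, y):
    return x * y  # symmetric kernel; phi_g(P) is the squared mean of P


atoms, probs = [0.0, 1.0, 2.0], [0.5, 0.25, 0.25]
limit = sum(p * g(x, x) for x, p in zip(atoms, probs))  # right side of (4.10)
for mass in (1.0, 0.1, 0.001):
    print(mass, prior_estimate_k2(g, atoms, probs, mass), limit)
```

Here the normalized measure is held fixed and only the total mass shrinks, which is a special case of (4.1); the printed estimates approach the diagonal integral $\int g(x,x)\,d\alpha_0(x)$ rather than the double integral.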


The results of this theorem generalize those of Ferguson [3] Sections
5b and 5e and Yamato [8], [9]. Also when $g(x_1, \ldots, x_k)$ is such that it
vanishes whenever two coordinates are equal, it is easy to see that




where $U_{g,n}$ is the usual U statistic based on $g$ and the sample $(X_1, \ldots, X_n)$.
This result is also contained in Yamato [8], [9].